# Online RLHF Optimization
Llama 3 8B SFR Iterative DPO R
An instruction-tuned model based on Llama-3-8B and trained with iterative DPO, an online RLHF method. It outperforms models of the same scale, and some larger ones, on multiple benchmarks.
Large Language Model
Transformers
Salesforce
55
78